CHAPTER 9 Summarizing and Graphing Your Data 117
from lowest to highest mmHg, you can list them as 84, 84, 89, 91, 110, 114,
and 116. There are seven values, and 91 is the fourth of the seven sorted values, so
that is the median. Three DBPs in the sample are smaller than 91 mmHg, and
three are larger than 91 mmHg. If you have an even number of values, the median
is the average of the two middle values. So imagine that you add a value of 118
mmHg to the top of your list, so you now have eight values. To get the median, you
would make an average of the fourth and fifth value, which would be (91 + 110)/2
= 100.5 mmHg (don’t be thrown off by the 0.5).
Statisticians often say that they prefer the median to the mean because the median
is much less strongly influenced by extreme outliers than the mean. For example,
if the largest value for DBP had been very high — such as 150 mmHg instead of
116 mmHg — the mean would have jumped from 98.3 mmHg up to 103.1 mmHg.
But in the same case, the median would have remained unchanged at 91. Here’s an
even more extreme example: If a multibillionaire were to move into a certain
state, the mean family net worth in that state might rise by hundreds of dollars,
but the median family net worth would probably rise by only a few cents (if it were
to rise at all). This is why you often hear the median rather than mean income in
reports comparing income across regions.
Mode
The mode of a sample of numbers is the most frequently occurring value in the
sample. One way to remember this is to consider that mode means fashion in
French, so the mode is the most popular value in the data set. But the mode has
several issues when it comes to summarizing the centrality of observed values for
continuous numerical variables. Often there are no exact duplicates, so there is no
mode. If there are any exact duplicates, they usually are not in the center of the
data. And if there is more than one value that is duplicated the same number of
times, you will have more than one mode.
So the mode is not a good summary statistic for sampled data. But it’s useful for
characterizing a population distribution, because it’s the value where the peak of
the distribution function occurs. Some distribution functions can have two peaks
(a bimodal distribution), as shown earlier in Figure 9-2d, indicating two distinct
subpopulations, such as the distribution of age of death from influenza in many
populations, where we see a mode in young children, and another mode in older
adults.
Considering some other “means” to
measure central tendency
Several other kinds of means are useful measures of central tendency in certain
circumstances. They’re called means because they all calculated using the same